Allow nativeparse to parse source code directly#21260
Conversation
|
Current CI failure is due to changed typing signature of |
This comment has been minimized.
This comment has been minimized.
c8c10dd to
ac275e4
Compare
This comment has been minimized.
This comment has been minimized.
444f4e9 to
149e459
Compare
for more information, see https://pre-commit.ci
This comment has been minimized.
This comment has been minimized.
|
CI failures:
Is it possible for CI to run a non-released version of |
This comment has been minimized.
This comment has been minimized.
Resolves #21 Tests are part of python/mypy#21260
This is mostly needed for #21260
ilevkivskyi
left a comment
There was a problem hiding this comment.
I have one comment for now. Also it looks like parallel type-checking is somehow broken by this.
This comment has been minimized.
This comment has been minimized.
|
Line 31 in 488a646 - if options.native_parser:
+ if options.native_parser and source:
Oops, "parallel checking" would |
|
Hint: |
|
Btw, I added some logging, and it looks like we sometimes pass a non-empty |
|
Yeah, we eagerly read the file if there is only one file in the parse batch. Anyway, no need to fix it in this PR since this is a pre-existing problem, you can just fix the crash by passing an actual source (which should be |
This comment has been minimized.
This comment has been minimized.
| if not os.path.exists(path): | ||
| build_error( | ||
| "Cannot read file '{}': {}".format( | ||
| path.replace(os.getcwd() + os.sep, ""), | ||
| os.strerror(2), # `errno.ENOENT` | ||
| ) | ||
| ) |
There was a problem hiding this comment.
This is temporary, I plan on making ast_serialize surface OSError instead in a follow-up.
|
@ilevkivskyi Appreciate the pointers. Some commentary: |
ilevkivskyi
left a comment
There was a problem hiding this comment.
This is not ready.
In general, I feel like ability to parse source should simplify things, not vice versa. A bunch of the logic was added solely for the purpose of diverting to the old parser in cases where there is no file.
Let's give this one more iteration (or I can simply do this myself).
| path.replace(os.getcwd() + os.sep, ""), | ||
| os.strerror(2), # `errno.ENOENT` | ||
| ) | ||
| ) |
There was a problem hiding this comment.
I don't think this is a good place for this check. This is executed in a thread, instead it should be done before parsing, to match existing logic.
This reverts commit 2a523b5.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| if post_parse: | ||
| self.post_parse_all(states) | ||
| # This duplicates a bit of logic from State.parse_file(). This is done to | ||
| # optimize handling of states parsed in parallel. |
There was a problem hiding this comment.
I've just copied the previous contents of def parse_parallel straight here, as I don't think State.parse_file() can be refactored very simply so that parallel parsing uses the same logic, even with removing the previous sequential states handling.
| # Handle fake `__init__.py` files due to `--package-root` | ||
| if ( | ||
| (source is None) | ||
| and (os.path.dirname(path) in self.fscache.fake_package_cache) | ||
| and (os.path.basename(path) == "__init__.py") | ||
| ): | ||
| source = "" |
There was a problem hiding this comment.
Substitutes previous handling of file_exists = self.fscache.exists(path, real_only=True) in the same method.
There was a problem hiding this comment.
I think you should not need this if you follow my suggestion in the first comment above.
ilevkivskyi
left a comment
There was a problem hiding this comment.
OK, this is moving in the right direction. I have few more comments.
| state.xpath.replace(os.getcwd() + os.sep, ""), | ||
| os.strerror(2), # `errno.ENOENT` | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Hm, this is a bit annoying. I guess it is better to keep real_only parameter, then you will be able to write here:
if not self.fscache.exists(state.xpath, real_only=True):
state.source = state.get_source()This way you will not need this, and also will be able to remove the ugly check below.
There was a problem hiding this comment.
I've generally avoided trying fixes that involved mutating state.source = outside of methods of State, but if that's ok, I've applied the suggestion.
There was a problem hiding this comment.
Yes, it is OK in this case as it will only affect fake/synthetic files.
| sequential_states, parallel_states | ||
| ) | ||
| for state in parallel_parsed_states: | ||
| # New parser only returns serialized ASTs |
There was a problem hiding this comment.
You modified this comment while copying, why?
There was a problem hiding this comment.
Original reason was because parallelize only those parts of the code that can be parallelized efficiently., to me, reads out of context when parse_parallel no longer handles a variable called sequential_states. But I've restored the original comment as-is.
There was a problem hiding this comment.
It is not really about sequential states, it is about the general logic: in parse_file() we do (roughly): pre-parse, parse, post-parse. In parse_all() we do: pre-parse sequentially, parse in parallel, post-parse sequentially. This is done like this to avoid overhead of context switches in code that holds the GIL (pre-parse and post-parse).
| # Handle fake `__init__.py` files due to `--package-root` | ||
| if ( | ||
| (source is None) | ||
| and (os.path.dirname(path) in self.fscache.fake_package_cache) | ||
| and (os.path.basename(path) == "__init__.py") | ||
| ): | ||
| source = "" |
There was a problem hiding this comment.
I think you should not need this if you follow my suggestion in the first comment above.
|
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
|
@bzoracler If you are interested in working more in this direction, I think #21222 and #21514 are good next issues. (Btw it could make sense to fix them in the same PR, as they are somewhat related). #21515 is another important thing, but it is waiting on input from @JukkaL |
|
I'll take a look at #21222 and #21514 in the next few days. #21515 looks quite involved so I'll pass on this for now, but I'm keen on seeing it resolved as it looks quite useful for plugins*, so if the work is green-lighted I may submit a PR in the next few weeks if I don't see any work done on it already. *Ruff parser allows parsing string forward expressions with leading spaces out-of-the-box, Python's |
This is the mypy counterpart of mypyc/ast_serialize#54